13 research outputs found

    Recent advances in LVCSR: A benchmark comparison of performances

    Get PDF
    Large Vocabulary Continuous Speech Recognition (LVCSR), which is characterized by high variability of the speech signal, is the most challenging task in automatic speech recognition (ASR). Believing that the evaluation of ASR systems on relevant and common speech corpora is one of the key factors that help accelerate research, we present in this paper a benchmark comparison of the performance of current state-of-the-art LVCSR systems over different speech recognition tasks. Furthermore, we objectively identify the best-performing technologies and the best accuracy achieved so far on each task. The benchmarks show that Deep Neural Networks and Convolutional Neural Networks have proven their efficiency on several LVCSR tasks by outperforming the traditional Hidden Markov Models and Gaussian Mixture Models. They also show that, despite satisfying performance on some LVCSR tasks, the problem of large-vocabulary speech recognition is far from solved in others, where more research effort is still needed.
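Systems in benchmarks such as this one are typically ranked by word error rate (WER), the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the reference length. A minimal sketch of that metric (function name and inputs are illustrative, not from the paper):

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance over word tokens / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # deletions only
    for j in range(len(h) + 1):
        dp[0][j] = j  # insertions only
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # match or substitution
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)
```

For example, `wer("the cat sat", "the cat")` counts one deletion against three reference words, giving 1/3.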

    The Role of Communication Technologies in Building Future Smart Cities

    Get PDF
    The world population is continuously growing and has reached a significant turning point: the number of people living in cities has surpassed the number living in rural areas. This puts national and local governments under pressure, because limited resources such as water, electricity, and transport must be optimized to cover the needs of citizens. Therefore, different tools, from sensors to processes, services, and artificial intelligence, are used to coordinate the usage of the infrastructures and assets of cities to build so-called smart cities. Different definitions and theoretical models of smart cities are given in the literature. However, a smart city can usually be modelled by a layered architecture, in which the communication and networking layer plays a central role. In fact, smart city applications rely on collecting field data from different infrastructures and assets, processing these data, taking intelligent control actions, and sharing information in a secure way. Thus, a reliable two-way communication layer is the basis of smart cities. This chapter introduces the basic concepts of this field and focuses on the role of communication technologies in smart cities. Potential technologies for smart cities are discussed, especially recent wireless technologies adapted to smart city requirements.

    Using data-driven and phonetic units for speaker verification

    Full text link
    A. E. Hannani, D. T. Toledano, D. Petrovska-Delacrétaz, A. Montero-Asenjo, J. Hennebert, "Using Data-driven and Phonetic Units for Speaker Verification", in Odyssey: The Speaker and Language Recognition Workshop, San Juan (Puerto Rico), 2006, pp. 1-6.
    Recognition of speaker identity based on modeling the streams produced by phonetic decoders (phonetic speaker recognition) has gained popularity during the past few years. Two of the major problems that arise when phone-based systems are being developed are the possible mismatches between the development and evaluation data and the lack of transcribed databases. Data-driven segmentation techniques provide a potential solution to these problems because they do not use transcribed data and can easily be applied on development data, minimizing the mismatches. In this paper we compare speaker recognition results using phonetic and data-driven decoders. To this end, we have compared the results obtained with a speaker recognition system based on data-driven acoustic units and phonetic speaker recognition systems trained on Spanish and English data. Results obtained on the NIST 2005 Speaker Recognition Evaluation data show that the data-driven approach outperforms the phonetic one and that further improvements can be achieved by combining both approaches.
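Phonetic speaker recognition of the kind compared here commonly scores a decoded phone stream against per-speaker n-gram statistics. A toy sketch of that idea, assuming a simple add-alpha smoothed bigram model over unit labels (all names and parameters are illustrative, not the authors' system):

```python
import math
from collections import Counter

def bigram_model(phones, alpha=1.0):
    """Add-alpha smoothed bigram probabilities estimated from a phone stream."""
    vocab = sorted(set(phones))
    pairs = Counter(zip(phones, phones[1:]))   # bigram counts
    history = Counter(phones[:-1])             # left-context counts
    return {(a, b): (pairs[(a, b)] + alpha) / (history[a] + alpha * len(vocab))
            for a in vocab for b in vocab}

def log_score(model, phones, floor=1e-9):
    """Log-likelihood of a phone stream under a bigram model."""
    return sum(math.log(model.get(p, floor)) for p in zip(phones, phones[1:]))
```

A verification decision would then compare the score of a test stream under the claimed speaker's model against a background model, accepting when the log-likelihood ratio exceeds a threshold; whether the units come from a phonetic or a data-driven decoder, the scoring machinery is the same.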

    Real-Time ASR from Meetings

    Get PDF
    The AMI(DA) system is a meeting room speech recognition system that has been developed and evaluated in the context of the NIST Rich Transcription (RT) evaluations. Recently, the "Distant Access" requirements of the AMIDA project have necessitated that the system operate in real time. Another, more difficult, requirement is that the system fit into a live meeting transcription scenario. We describe an infrastructure that has allowed the AMI(DA) system to evolve into one that fulfils these extra requirements. We emphasise the components that address the live and real-time aspects.

    The AMIDA 2009 Meeting Transcription System

    Get PDF
    We present the AMIDA 2009 system for participation in the NIST RT’2009 STT evaluations. Systems for close-talking, far field and speaker attributed STT conditions are described. Improvements to our previous systems are: segmentation and diarisation; stacked bottle-neck posterior feature extraction; fMPE training of acoustic models; adaptation on complete meetings; improvements to WFST decoding; automatic optimisation of decoders and system graphs. Overall these changes gave a 6-13% relative reduction in word error rate while at the same time reducing the real-time factor by a factor of five and using considerably less data for acoustic model training.

    System-independent ASR error detection and classification using Recurrent Neural Network

    Get PDF
    This paper addresses errors in continuous Automatic Speech Recognition (ASR) in two stages: error detection and error type classification. Unlike the majority of research in this field, we propose to handle recognition errors independently from the ASR decoder. We first establish an effective set of generic features derived exclusively from the recognizer output to compensate for the absence of ASR decoder information. Then, we apply variant Recurrent Neural Network (V-RNN) based models for error detection and error type classification. Such models learn information additional to the recognized-word classification by exploiting label dependency. As a result, experiments on the Multi-Genre Broadcast Media corpus have shown that the proposed generic feature setup achieves competitive performance compared to state-of-the-art systems in both tasks. Furthermore, we have shown that a V-RNN trained on the proposed feature set is an effective classifier for ASR error detection, with an accuracy of 85.43%.
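As a structural illustration only (not the authors' V-RNN, whose feature set and label-dependency mechanism are specific to the paper), a recurrent tagger that assigns each recognized word an error probability from per-word features derived from the recognizer output might look like this minimal, untrained Elman-style sketch:

```python
import numpy as np

def rnn_error_tagger(feats, Wx, Wh, b, Wo, bo):
    """Tag each recognized word with P(error) via a simple recurrent pass.

    feats: (T, d) matrix of per-word features taken from the recognizer
    output (e.g. confidence score, word length, LM score) -- a hypothetical
    feature set, standing in for the paper's generic features."""
    T, d = feats.shape
    h = np.zeros(Wh.shape[0])          # recurrent hidden state
    probs = []
    for t in range(T):
        h = np.tanh(feats[t] @ Wx + h @ Wh + b)          # recurrence
        probs.append(1.0 / (1.0 + np.exp(-(h @ Wo + bo))))  # sigmoid output
    return np.array(probs)              # one error probability per word
```

The recurrence is what lets the prediction for word t depend on the labels and features of preceding words, which is the intuition behind exploiting label dependency; in practice the weights would of course be trained on annotated ASR output.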

    Applying an ant colony inspired approach to recommend learning paths in an online course: model and experiment

    No full text
    In this article, we present the implementation, experimentation, and evaluation of an approach for recommending learning paths in an online course. The recommendation process is inspired by swarm intelligence, and more particularly by ant colony optimization (ACO). In this context, we considered a differentiation of learning paths according to the activity explored for learning a course. With the objective of recommending learning paths considered optimal, and thus evaluating their impact on learning in an online course, the proposed approach is based both on the recommendation of relevant paths by the teacher and on the results progressively stored by learners along the paths they take. Our approach was validated experimentally, and the results obtained showed the emergence of a learning path favouring the success of a relatively considerable number of learners.
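The evaporation-and-reinforcement loop at the heart of ACO-style recommendation can be sketched as follows; the path names, rates, and update rule are illustrative assumptions, not the article's exact model:

```python
import random

def recommend_paths(paths, success_prob, learners=500, rho=0.1, q=1.0, seed=0):
    """ACO-inspired sketch: each simulated learner picks a learning path with
    probability proportional to its pheromone; a successful outcome deposits
    pheromone on the path taken, and evaporation discounts old traces."""
    rng = random.Random(seed)
    pher = {p: 1.0 for p in paths}      # equal initial pheromone
    for _ in range(learners):
        # roulette-wheel selection proportional to pheromone
        total = sum(pher.values())
        r, acc, choice = rng.random() * total, 0.0, paths[-1]
        for p in paths:
            acc += pher[p]
            if r <= acc:
                choice = p
                break
        # evaporation on every path
        for p in paths:
            pher[p] *= (1 - rho)
        # a successful learner reinforces the path it took
        if rng.random() < success_prob[choice]:
            pher[choice] += q
    return pher
```

For instance, `recommend_paths(["A", "B"], {"A": 0.9, "B": 0.3})` simulates learners whose stored results favour path A; over time the positive feedback between selection and reinforcement makes one path emerge as the recommended one, mirroring the emergence of a dominant learning path reported in the article.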
